TREASURY INSPECTOR GENERAL FOR TAX ADMINISTRATION

Disaster Recovery Testing Is Being Adequately Performed, but Problem Reporting and Tracking Can Be Improved

May 3, 2012

Reference Number: 2012-20-041

This report has cleared the Treasury Inspector General for Tax Administration disclosure review process and information determined to be restricted from public release has been redacted from this document.

Phone Number   | 202-622-6500
E-mail Address | TIGTACommunications@tigta.treas.gov
Website        | http://www.tigta.gov

HIGHLIGHTS

DISASTER RECOVERY TESTING IS BEING ADEQUATELY PERFORMED, BUT PROBLEM REPORTING AND TRACKING CAN BE IMPROVED

Final Report issued on May 3, 2012

Highlights of Reference Number: 2012-20-041 to the Internal Revenue Service Chief Technology Officer.

IMPACT ON TAXPAYERS

Disaster recovery planning is a coordinated strategy for recovering computer systems following a disruption. By testing disaster recovery plans, recovery problems can be identified and corrected before an actual disruption occurs. The IRS is adequately planning and conducting disaster recovery tests, but IRS reporting of problems identified during the tests and the tracking of progress to implement recommendations made at the conclusion of the tests need to be improved. Effective disaster recovery capabilities are critical to ensure that key information systems can be recovered with minimal disruption to the critical IRS business processes they support. The data and services provided by these systems are also needed by Congress, the Department of the Treasury, tax professionals, taxpayers, and other Government agencies.

WHY TIGTA DID THE AUDIT

During this audit, TIGTA observed and/or reviewed IRS disaster recovery tests. The IRS is required to conduct disaster recovery tests on its most critical computer systems. Disaster recovery testing is conducted to test the IRS’s ability to recover major computer systems at one Computing Center to another Computing Center. This review was requested by the Cybersecurity organization and is also part of our statutory requirements to annually review the adequacy and security of IRS technology.

WHAT TIGTA FOUND

The IRS is adequately planning and conducting disaster recovery tests of critical current production environment computer systems and is performing disaster recovery exercises and tests on the Customer Account Data Engine 2 system as it is being developed.

However, the IRS can improve disaster recovery test problem reporting and tracking. TIGTA found that problem tickets used by the IRS for identifying, resolving, and tracking problems encountered during the tests were not created for several problems. In addition, reports prepared by the IRS during the disaster recovery tests used to track the progress and problems it encountered in recovering systems did not have complete information on many of the processes run and problems identified during the tests. Finally, the IRS did not have a process for closely and formally tracking the implementation of the less serious recommendations it made at the conclusion of the disaster recovery tests. During the course of the audit, TIGTA auditors informed the IRS of the need to track these recommendations, and the IRS recently developed a tracking worksheet.

WHAT TIGTA RECOMMENDED

TIGTA recommended that the Associate Chief Information Officer, Cybersecurity, 1) revise reports the IRS prepares during disaster recovery tests to include required entries for references to problem tickets and 2) create a process for reviewing the completeness of problem tickets and reports prepared during the tests to help ensure that they contain complete information.

In its response to the report, the IRS agreed with TIGTA’s recommendations. The IRS 1) revised its disaster recovery test reports to require entries for references to problem tickets and 2) created a process for reviewing the completeness of problem tickets and reports.

DEPARTMENT OF THE TREASURY
WASHINGTON, D.C. 20220

TREASURY INSPECTOR GENERAL FOR TAX ADMINISTRATION

May 3, 2012

MEMORANDUM FOR CHIEF TECHNOLOGY OFFICER

FROM:     Michael R. Phillips
          Deputy Inspector General for Audit

SUBJECT:  Final Audit Report – Disaster Recovery Testing Is Being Adequately Performed, but Problem Reporting and Tracking Can Be Improved (Audit # 201120024)

This report presents the results of our review of disaster recovery testing activities. The overall objective was to observe Internal Revenue Service (IRS) disaster recovery testing to determine whether the IRS is adequately testing its capability to recover major computer systems from one Computing Center to another and whether systems can be successfully recovered. This review was requested by the Cybersecurity organization.¹ This review addresses the major management challenge of Security for Taxpayer Data and Employees, is part of our statutory requirements to annually review the adequacy and security of IRS technology, and is included in our Fiscal Year 2012 Annual Audit Plan.

Management’s complete response to the draft report is included as Appendix V.

Copies of this report are also being sent to the IRS managers affected by the report recommendations. Please contact me at (202) 622-6510 if you have questions or Alan Duncan, Assistant Inspector General for Audit (Security and Information Technology Services), at (202) 622-5894.

¹ See Appendix IV for a glossary of terms.

Table of Contents

Background

Results of Review
    Disaster Recovery Tests Are Being Adequately Planned and Conducted, and Exercises and Tests Were Performed During the Development of the Customer Account Data Engine 2 System
    Disaster Recovery Test Problem Reporting and Tracking Can Be Improved
        Recommendations 1 and 2

Appendices
    Appendix I – Detailed Objective, Scope, and Methodology
    Appendix II – Major Contributors to This Report
    Appendix III – Report Distribution List
    Appendix IV – Glossary of Terms
    Appendix V – Management’s Response to the Draft Report

Abbreviations

CADE 2          Customer Account Data Engine 2
IMF             Individual Master File
IRS             Internal Revenue Service
ITAMS           Information Technology Asset Management System
KISAM           Knowledge, Incident/Problem, Service and Asset Management
NIST            National Institute of Standards and Technology

Background

The ability of the Internal Revenue Service (IRS) to carry out its mission and provide key taxpayer service and enforcement operations is heavily dependent on an extensive network of computer systems spread across the country. During Fiscal Year¹ 2010, the IRS reported that its computer systems processed more than 230 million returns, provided more than $467 billion in refunds, collected more than $2.3 trillion in taxes (93 percent of the Federal Government’s receipts), received more than 305 million visits to its websites, and received more than 98 million electronically filed individual income tax returns. In addition to the IRS needing these systems to administer the Nation’s tax system, data and services provided by these systems are needed by Congress, the Department of the Treasury, tax professionals, taxpayers, and other Government agencies.

Significant events, such as the terrorist attacks on September 11, 2001, and Hurricane Katrina in August 2005, have emphasized the need for organizations to have plans in place that will ensure essential operations can continue during a wide range of emergencies.
Disaster recovery is an organization’s ability to respond to a disruption in services by implementing a plan to restore critical business functions within the stated disaster recovery goals. Disaster recovery planning² is a coordinated strategy involving plans, procedures, and technical measures that enable the recovery of information systems, computer operations, and data. If the IRS does not sufficiently test disaster recovery plans in accordance with policies and guidance, the risk increases that critical systems and the business processes supported by these systems may not be successfully recovered in a timely manner after a disruption. This would severely impact the ability of the IRS to carry out its mission. Testing of disaster recovery capabilities is a way of identifying deficiencies in disaster recovery plans, procedures, and training. By effectively testing disaster recovery plans, problems can be identified and corrected before an actual disruption occurs.

The Federal Information Security Management Act of 2002³ and Office of Management and Budget mandates require agencies to establish an information technology disaster recovery planning and testing program to ensure that computer systems can be recovered in a timely manner after a disruption. Pursuant to its responsibilities under the Federal Information Security Management Act, the National Institute of Standards and Technology (NIST) developed standards and guidelines that Federal agencies are required to use in developing, conducting, and evaluating disaster recovery tests. The Department of the Treasury requires bureaus to develop and implement a robust, cost-effective information technology security program that includes disaster recovery testing.

¹ See Appendix IV for a glossary of terms.
² Information technology disaster recovery planning is also referred to as contingency planning. Because universally accepted definitions are not available, throughout this report we used the term disaster recovery.
³ 44 U.S.C. §§ 3541 – 3549.

NIST and IRS policies require the IRS to conduct disaster recovery testing on its most critical computer systems, while less critical systems receive less rigorous disaster recovery exercises. Disaster recovery testing is conducted in as close to an operational environment as possible using components or systems used to conduct daily operations. Disaster recovery testing is designed to evaluate the IRS’s readiness to cut over, relocate, restore, or rebuild its major systems and applications operating at one Computing Center to another Computing Center. To plan for a disaster recovery test, objectives and scope are defined and checklists, test plans, and other test documentation materials are developed. As the test is conducted, observations, notes, and forms are completed. At the end of a disaster recovery test, results are recorded, lessons learned are documented, corrective action plans are initiated, and disaster recovery plans are updated.

In a previous audit,⁴ we reviewed the IRS’s progress in completing its corrective actions on the seven components of its disaster recovery material weakness⁵ and determined that corrective actions for the component on exercising and testing disaster recovery plans were being adequately completed.
During this disaster recovery testing audit, we observed and/or reviewed IRS disaster recovery tests that took place in July, August, and October 2011.

This review was performed at the Cybersecurity organization’s Disaster Recovery Testing Exercise and Evaluation offices in Martinsburg, West Virginia, and Memphis, Tennessee, during the period August 2011 through January 2012. We conducted this performance audit in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objective. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objective. Detailed information on our audit objective, scope, and methodology is presented in Appendix I. Major contributors to the report are listed in Appendix II.

⁴ Treasury Inspector General for Tax Administration, Ref. No. 2011-20-060, Corrective Actions to Address the Disaster Recovery Material Weakness Are Being Completed (June 2011).
⁵ In March 2005, the IRS declared its disaster recovery program a material weakness in accordance with the Federal Managers’ Financial Integrity Act of 1982 [31 U.S.C. §§ 1105, 1113, 3512 (2000)]. The Federal Managers’ Financial Integrity Act requires each Federal agency to conduct annual evaluations of its systems of internal accounting and administrative control. Each agency is also required to prepare an annual report for Congress and the President that identifies material weaknesses and the agency’s corrective action plans and schedules.

Results of Review

Disaster Recovery Tests Are Being Adequately Planned and Conducted, and Exercises and Tests Were Performed During the Development of the Customer Account Data Engine 2 System

NIST and IRS disaster recovery testing policies and requirements cite the need to adequately plan and conduct disaster recovery tests and to perform disaster recovery exercises and tests as systems are being developed. The IRS is adequately planning and conducting disaster recovery tests of critical current production environment computer systems and is performing disaster recovery exercises and tests on the Customer Account Data Engine 2 (CADE 2) system as it is being developed.

Disaster recovery capability of critical current production environment computer systems is being adequately tested

NIST Special Publication 800-84, Guide to Test, Training, and Exercise Programs for Information Technology Plans and Capabilities, cites the need to adequately plan disaster recovery tests. During the planning phase, the disaster recovery test is designed and test documentation is prepared. Appropriate planning meetings are held, the scope and objectives are established, specific tests and test cases are developed, the use of tools is determined, and test plans and guides are developed. If feasible, the plan should require that the test be done using components or systems used to conduct daily operations.
To help plan the disaster recovery test, the IRS has created a Disaster Recovery Test Plan Template. This template contains the test’s general objectives; general information such as the dates, call-in number, and scenario; the results of any planning meetings that were held; test schedule and test scope; recovery time objectives; test participants; and responsibilities for test execution and summarization. To help plan the test cases, the IRS has created the Test Case Daily Action Report Template, which is prepared for each system that will be tested. This template contains the overall test objectives for each system to be recovered, the specific steps that will be performed for each system, and test cases for each system specifying equipment and personnel needs, goals, success criteria, and deliverables.

NIST Special Publication 800-84 and Internal Revenue Manual 10.8.62, Information Technology Contingency Plan and Disaster Recovery Testing, Training, and Exercise Program,⁶ cite the need to adequately conduct disaster recovery tests. Disaster recovery tests begin with a scenario containing the cause of the disaster, the systems and disaster recovery plans to be activated, the name of the alternate facility, the unavailability of staff at the damaged site, and other important information and rules on conducting the test. IRS test leaders are required to conduct at least two status meetings with the recovery staff each day, and independent observers are assigned to review and record recovery activities. Only the latest versions of disaster recovery plans are to be used to recover the systems. Staff from the production site should not be allowed to assist with recovery at the alternate site during a disaster recovery test because in a worst-case real disaster those persons might not be available. Lessons learned from previous tests are built into subsequent tests to improve both the recovery and the testing process. As the test is conducted, disaster recovery staff enters results of the testing into the Test Case Daily Action Report.

⁶ IRM 10.8.62 (Feb. 28, 2009).

We determined the disaster recovery tests conducted in July, August, and October 2011 were adequately planned. Specifically, we found:
   • Disaster Recovery Test Plans were properly prepared with the test’s objectives, scope, systems, scenario, instructions, recovery time objectives, and other necessary information.
   • Test Case Daily Action Reports⁷ were populated with the recovery activities, test cases, and job runs that will be tracked during the test with start and stop times.
   • Systems were to be restored on actual production systems at the recovery site.

We also determined testing procedures were adequately followed for the August and October 2011 disaster recovery tests.
Specifically, we found:
   • In the August 2011 test, the disaster declaration, rules, and instructions were announced. Because the October 2011 test was limited in scope, a disaster declaration was not required.
   • The daily status meetings reviewed progress and discussed and resolved problems.
   • The Disaster Recovery Test Plan and the Test Case Daily Action Report were used to track process and job runs and also to ensure test objectives for each system were fulfilled.
   • Other testing requirements were adhered to, such as using only the latest copy of the disaster recovery plan, not allowing staff at the “disaster site” to participate in any recovery activity unless stipulated in the disaster recovery test scenarios, focusing on the lessons learned from the previous disaster recovery test, and Cybersecurity organization staff acting as an independent observer and recorder.

⁷ See the definition in the section on ITAMS ticket reporting and Test Case Daily Action Reports.

Generally, systems and applications were successfully recovered within recovery time objectives during the July and August 2011 tests. However, the recovery of a mainframe computer experienced significant problems during the July 2011 test. Because backup tapes had not been made, critical database subsystems could not be recovered until new backup tapes were created by the disaster site during the test.
Five application databases were not recoverable, and much of the batch processing could not be performed.

The October 2011 test was a limited test that focused on the problems that occurred in recovering the mainframe computer in the July 2011 test. The October 2011 test showed significant improvement on the problems experienced during the July 2011 test.

CADE 2 disaster recovery exercises and tests are being performed as the system is being developed

NIST Special Publication 800-34, Contingency Planning Guide for Federal Information Systems, requires that initial disaster recovery exercises and tests be performed during the implementation phase of the Software Development Life Cycle to validate Information System Contingency Plan recovery procedures.

Disaster recovery exercises consist of Table Top Exercises and Functional Exercises. Table Top Exercises are discussion-based (walkthrough) exercises that do not involve deploying or recovering systems, equipment, or resources. Personnel meet to discuss their roles during an emergency and their responses to a particular emergency. The participants validate the content of the plan and related policies and procedures in the context of a particular emergency situation. Functional Exercises are more extensive than Table Top Exercises, requiring the event to be simulated. The exercises are designed to test procedures and assets involved in one or more functional aspects of the disaster recovery plan, such as backup retrieval, reading backup data, and validation of off-site storage.

Disaster recovery tests are conducted in as close to an operational environment as possible using components or systems used to conduct daily operations. The scope of testing can range from individual system components or systems to comprehensive tests of all systems and components that support a disaster recovery plan.
These tests are designed to evaluate IRS readiness to cut over, relocate, restore, or build IRS systems. Disaster recovery tests involve activities such as cutovers from one platform or system to another, relocation of systems, or recovery of platforms and their hosted applications.

The CADE 2 is a vital IRS modernization effort and foundational component of the IRS’s technology strategy that builds on the foundation of the current CADE. It is one of the IRS’s top priority information technology investments. Its successful implementation is essential to reach the IRS’s vision for tax administration. The CADE 2 will provide state-of-the-art individual taxpayer account processing and technologies to improve service to taxpayers and enhance IRS tax administration capabilities. It will provide faster refunds for millions of individual taxpayers and faster payment postings, account updates, and taxpayer notices. The CADE 2 will integrate the CADE and the Individual Master File (IMF) into a single taxpayer processing system. It will also provide a single database that will improve user access to accurate and timely data. The CADE 2 will be implemented in phases. The first phase, called Transition State 1, is scheduled to be delivered for the 2012 Filing Season and will implement daily IMF processing. Daily processing will provide more accurate, timely data to frontline IRS employees and is expected to allow the IRS to update and settle accounts more quickly.

The IRS has developed a draft CADE 2 Program Disaster Recovery Design Plan.
This document shows CADE 2 disaster recovery logical design, infrastructure, cost and sizing estimates, and other disaster recovery considerations. The CADE 2 has nine core components, and the IRS will be implementing two new types of disaster recovery technologies for six of them. These two new technologies are Virtual Tape Replication and Storage-Based Asynchronous Replication.

Virtual Tape Replication, instead of traditional backup tape, will be used because the CADE 2 IMF Daily Processing core component will process IMF data daily instead of weekly. To accommodate the backing up of IMF files that are processed daily, the IRS will implement a disaster recovery solution using International Business Machines Corporation’s virtual tape replication product called gridding. Gridding is a technology that makes it possible to save data as if it were being stored on tape, although it is actually being stored on hard disk or another medium. Each day, gridding will transmit over the network IMF daily processing data from the Virtual Tape Library in the Martinsburg Computing Center to the Memphis Computing Center recovery site.

International Business Machines Corporation’s Global Mirror storage-based asynchronous data replication is a solution that the IRS will use to replicate production data that will be transferred over the network from the Martinsburg Computing Center to the Memphis Computing Center recovery site for six CADE 2 core components. Asynchronous replication is a technique for replicating data between file systems in which the system being replicated can continue to be changed without waiting for the remote system to record the changes it previously transmitted. An example of a CADE 2 core component that will be replicated using Global Mirror is the CADE 2 Database Implementation, which contains IMF data.
The entire Martinsburg Computing Center CADE 2 database will initially be replicated to the Memphis Computing Center recovery site, after which only changes made to the database will be replicated on a daily basis.

The IRS has performed disaster recovery exercises and tests on the gridding and Global Mirror disaster recovery solutions and reported on the results of these exercises and tests in the draft CADE 2 Disaster Recovery Testing Reports Overview for Transition State 1, Enterprise Life Cycle Milestone 4b. The main purpose of these exercises and tests was to verify, prior to the implementation of the CADE 2, that CADE 2 disaster recovery solutions are ready for use. Disaster recovery exercise and test results in the IRS’s report have confirmed the ability of these two solutions to successfully replicate production data from the Martinsburg Computing Center to the Memphis Computing Center recovery site. The following disaster recovery exercises and tests were performed (in order of complexity).
   • On October 20, 2011, the IRS performed a Table Top Exercise on gridding and Global Mirror replication procedures for recovering the Martinsburg Computing Center IMF and mainframe systems in the Memphis Computing Center recovery site. The exercise identified changes needed to disaster recovery procedures and other action items.
   • In September 2011, the IRS performed a Functional Exercise on gridding replication. The exercise confirmed that virtual tape files initially sent and update files subsequently sent by the gridding solution in the Martinsburg Computing Center were successfully received by the gridding solution in the Memphis Computing Center recovery site.
   • On October 7, 2011, the IRS completed a Functional Exercise on Global Mirror replication. The exercise confirmed that test volumes for the Martinsburg Computing Center IMF and mainframe systems initially sent and update files subsequently sent by Global Mirror in the Martinsburg Computing Center were successfully received by the Memphis Computing Center recovery site.
   • On November 4, 2011, the IRS completed a Disaster Recovery Test on gridding and Global Mirror replication. The test confirmed the ability to restore the Martinsburg Computing Center IMF and mainframe systems in the Memphis Computing Center recovery site using gridding and Global Mirror file replication.

The IRS has also performed an initial disaster recovery test of the CADE 2 database as it was being loaded. In late November 2011, the IRS performed a test that confirmed that the files were backed up and reconciled.
The IRS reported that the test successfully compared production\nfiles to recovery files and matched record counts to confirm the backed up files.\nThe IRS is planning to conduct a Computing Center to Computing Center disaster recovery test\nin Calendar Year 2012, at which time the CADE 2 production system in the Martinsburg\nComputing Center will be tested for recovery in the Memphis Computing Center recovery site.\n\nDisaster Recovery Test Problem Reporting and Tracking Can Be\nImproved\nNIST, Department of the Treasury, and IRS disaster recovery testing policies and requirements\ncite the need to report on testing problems and the status of testing processes and to track\nrecommendations to improve disaster recovery capabilities. The IRS can improve the following\nareas of disaster recovery test problem reporting and tracking during the execution phase of\ndisaster recovery tests.\n   \xef\x82\xb7   Problem tickets used by the IRS for identifying, resolving, and tracking problems\n       encountered during the tests were not created for many problems.\n\n                                                                                             Page 7\n\x0c                   Disaster Recovery Testing Is Being Adequately Performed,\n                     but Problem Reporting and Tracking Can Be Improved\n\n\n\n   \xef\x82\xb7   Reports prepared by the IRS during the disaster recovery tests to track the progress and\n       problems encountered in recovering systems did not have complete information on many\n       of the problems and the processes run during the test.\n   \xef\x82\xb7   The IRS did not have a process for closely and formally tracking the implementation of\n       the less serious recommendations made at the conclusion of the disaster recovery tests.\n\nInformation Technology Asset Management System (ITAMS) ticket reporting and\nTest Case Daily Action Reports can be improved\nNIST Special Publication 800-84 states that reporting on disaster 
recovery testing should
determine how well tested systems or components functioned. The introduction to the disaster
recovery testing report should document background information about the test, such as the
scope, objectives, and tests. The report should also document observations made by the test team
during the test, lessons learned during the test, and recommendations for enhancing the disaster
recovery plan that had its components or systems tested, along with associated procedures and
components.
Internal Revenue Manual 10.8.62 requires that several reports be completed before, during, and
after disaster recovery testing. These reports evaluate disaster recovery test results and identify
weaknesses and corrective actions to improve IRS preparedness.
   1. The Test Case Daily Action Report contains the overall disaster recovery test objectives
      for each system to be recovered; the specific steps that will be performed for each
      system; test cases for each system specifying equipment and personnel needs, goals,
      success (outcome) criteria, and deliverables; and the designation of the disaster recovery
      site executive as the person who decides on an early termination and the termination
      criteria. A completed Test Case Daily Action Report should indicate the start and stop
      time of each process, file, or job run; whether it was completed with or without
      interruption; a description of the interruption; a description of the corrective action
      used to resolve the interruption; and whether a problem encountered in performing a
      test case required an update to the disaster recovery plan. The Test Case Daily Action
      Report is prepared by Enterprise Operations organization staff during the disaster
      recovery test.
   2. The Vulnerabilities Matrix Report is prepared by the Cybersecurity organization and
      contains information on the problems that were reported in ITAMS tickets during the
      disaster recovery test. The ITAMS is a centralized database for incident management.
      The IRS requires that ITAMS tickets be created for all problems that occur during a
      disaster recovery test so that problems can be properly identified, resolved, and tracked,
      and disaster recovery plans can be revised if needed. The IRS also requires that ITAMS
      tickets created during a disaster recovery test contain detailed problem and problem
      resolution descriptions. Disaster scenario scripts that the IRS uses during disaster
      recovery tests stress that ITAMS tickets must contain detailed problems and resolutions.
      The ITAMS was replaced with the Knowledge, Incident/Problem, Service and Asset
      Management system (KISAM) on October 1, 2011.
   3. The Detailed Daily Observation Report is prepared by the Cybersecurity organization
      and contains detailed recordings of the daily status meetings that take place during the
      disaster recovery test. It also contains other observations.
   4. The Executive Overview Report is prepared by the Cybersecurity organization and
      contains the disaster recovery test’s scenario, overall recovery results, recovery
      directives, test objectives, scope, accomplishments, findings, recommendations, and
      actions needed to correct weaknesses to specific computer systems.
The Detailed Daily Observations Report for the July 2011 disaster recovery test and the Draft
Detailed Daily Observations Report for the August 2011 test contained substantial notes on the
daily status meetings that took place on each day of the recovery, covered the discussions on
each system that was being recovered, and contained Cybersecurity organization observations.
The Executive Overview Report for the July 2011 disaster recovery test and the Draft Executive
Overview Report for the August 2011 test contained key information on the test’s scope,
objectives, tests, observations, lessons learned/findings, and recommendations, as suggested in
NIST Special Publication 800-84.
While the IRS prepared 46 ITAMS tickets during the August and October 2011 disaster recovery
tests, tickets were not prepared for 10 problems. For eight of the 10 problems, the Test Case
Daily Action Report indicated a problem, but an ITAMS ticket had not been prepared.
Many problem entries in Test Case Daily Action Reports and in ITAMS tickets shown in the
Vulnerabilities Matrix and Shift Turnover Reports for the July and August 2011 tests did not
adequately describe the problems or interruptions that occurred or describe how the problems or
interruptions were resolved. For example, the IRS prepared 105 ITAMS tickets during these two
disaster recovery tests, but 20 tickets did not contain adequate descriptions of the problems or
interruptions or describe how they were resolved.
Sixty-four problems reported in the Test Case
Daily Action Reports for these two tests did not contain adequate descriptions of the problems or
interruptions or describe how they were resolved.
Other sections of the Test Case Daily Action Reports for the July and August 2011 disaster
recovery tests were also incomplete. For 178 of the processes and jobs listed in the Test Case
Daily Action Reports, there was no indication of whether the processes or jobs were completed
or whether they were completed with or without an interruption. For 111 of the processes or jobs
listed in the Test Case Daily Action Reports that encountered a problem, there was no indication
of whether an update to the disaster recovery plan was needed.
ITAMS tickets were not prepared, and ITAMS tickets and Test Case Daily Action Reports were
incomplete, because they are prepared during the disaster recovery test. During the test, the
technical employees who prepare these reports are concentrating on performing steps necessary
to continue recovering the systems. These reports are generally available at the completion of
the disaster recovery test and are not revised or updated after the test is completed. Another
reason why ITAMS tickets were not prepared could be that the template for the Test Case Daily
Action Report lacks a column for entering the ITAMS ticket number related to a test case
problem or for providing a reason why a ticket is not needed. Such a column would facilitate a
repeatable process for creating necessary ITAMS tickets.
Such a column would also be helpful in
understanding the problems and the corrective actions because it would make it possible to
associate a problem in the Test Case Daily Action Report with an ITAMS ticket that might have
additional information on the same problem. The Test Case Daily Action Report for the
July 2011 disaster recovery test contained references to some of the ITAMS tickets that had been
prepared. During the May 2011 disaster recovery test, participants were reminded that if a run or
job in a test case has a problem, the related ITAMS ticket number should be included in the Test
Case Daily Action Report. Therefore, it appears that the need to put ITAMS ticket numbers on
the Test Case Daily Action Report has been at least anecdotally recognized.
If ITAMS tickets are not created and problem reporting is not improved, the risk increases that
disaster recovery problems might not be properly identified, resolved, and tracked. In addition,
disaster recovery test planners will have less information to review when they begin planning for
the next disaster recovery test.

Report recommendations to address some disaster recovery test problems are
not closely or formally monitored
Treasury Directive Publication 85-01, Treasury Information Technology Security Program,
requires that bureaus have a process to track information technology security weaknesses and
actions to correct them. Internal Revenue Manual 10.8.60, Information Technology Security,
Information Technology Disaster Recovery Policy and Guidance,8 requires that the IRS track and
document findings and lessons learned and ensure that corrective action plans are implemented
and findings are resolved.
The Cybersecurity organization does not closely or formally track the completion of all of the
recommendations it makes in disaster recovery test Executive Overview Reports.
The most
serious recommendations are entered into the Plan of Action and Milestones tracking system, but
less serious recommendations are not entered into a tracking system. Many of the
recommendations in the Executive Overview Reports are not Plan of Action and Milestones type
recommendations. Weekly meetings are held between Enterprise Operations and Cybersecurity
organization staff, at which time they can discuss the recommendations and long-term
issues/problems.

8 IRM 10.8.60 (Jun. 1, 2009).

The Cybersecurity organization does not have a template for tracking the completion of some
disaster recovery test corrective actions or a matching standardized process to ensure that
tracking is implemented continuously and on a repeatable basis.
When not all disaster recovery test recommendations are formally tracked, the risk increases that
corrective actions may not be adequately addressed. Without a formal tracking template and
process, the resources, time periods, and current status of the corrective actions might not be
adequately defined and accomplished.
Management Action: During the course of the audit, we informed Cybersecurity organization
staff of the need for an additional worksheet for tracking corrective actions. Cybersecurity staff
developed a worksheet for this purpose, provided the worksheet to us for our review, and made
modifications based on our suggestions.
The worksheet was also vetted at a weekly disaster
recovery test collaboration meeting, and the Cybersecurity organization staff plans to populate
the worksheet with past test recommendations and update it during weekly meetings.

Recommendations
Recommendation 1: The Associate Chief Information Officer, Cybersecurity, should revise
the Test Case Daily Action Report template to include an ITAMS/KISAM ticket column that
would require the entry of the ITAMS/KISAM number for all problems reported on the Test
Case Daily Action Report or a reason why an ITAMS/KISAM ticket is not required.
       Management’s Response: The IRS agreed with our recommendation. The IRS
       revised the Test Case Daily Action Report to include columns indicating run interruption,
       ITAMS/KISAM ticket number, and a reason why an ITAMS/KISAM ticket is not
       required.
Recommendation 2: The Associate Chief Information Officer, Cybersecurity, should create a
process for reviewing the completeness of ITAMS/KISAM tickets and Test Case Daily Action
Reports immediately after the disaster recovery test is completed so that the test staff can provide
any missing information before reporting back to their regular duties.
       Management’s Response: The IRS agreed with our recommendation.
The IRS
       developed a process for reviewing the completeness of ITAMS/KISAM tickets and Test
       Case Daily Action Reports.

Appendix I

Detailed Objective, Scope, and Methodology

The overall objective of this review was to observe IRS disaster recovery testing to determine
whether the IRS is adequately testing its capability to recover major computer systems from one
Computing Center1 to another and whether systems can be successfully recovered. To
accomplish our objective, we:
I.     Obtained and reviewed guidance and criteria on disaster recovery testing.
II.    Determined if the IRS adequately planned the July, August, and October 2011 disaster
       recovery tests.
       A. Obtained and became familiar with the IRS Disaster Recovery Test Plan.
       B. Determined if the Disaster Recovery Test Plan was adequately completed and whether
          planned test cases were properly created in the Test Case Daily Action Reports.
       C. Determined which computers and storage at the recovery site were used in the
          recovery test to ensure that disaster recovery testing was conducted in as close to an
          operational environment as possible.
       D. Obtained and reviewed disaster recovery planning documents for the CADE 2.
III.   Observed the August and October 2011 disaster recovery tests to review disaster recovery
       testing procedures.
       A. Observed during testing whether the Disaster Recovery Test Plan and planned test
          cases were followed and whether ITAMS tickets were prepared for problems
          encountered in performing the test cases.
       B. Determined if any part of the Disaster Recovery Test Plan or planned test cases were
          not tested as planned based on observations, meeting attendance, and reports issued
          after the test.
       C. Observed whether various disaster recovery testing procedures, instructions, and
          requirements were followed.

1 See Appendix IV for a glossary of terms.

IV.    Observed disaster recovery tests and reviewed disaster recovery testing reports for the
       July and August 2011 disaster recovery tests to determine if systems were successfully
       recovered.
       A. Determined if recovered systems were tested to the point that users could use the
          system, jobs could be run, and other systems could exchange data with them.
       B. Determined if any systems were recovered but not within stated recovery time
          objectives.
       C. Determined if any systems were recovered but with problems (other than not meeting
          recovery time objectives).
       D. Determined if any systems were not completely recovered and why.
V.     Reviewed IRS reports on the results of disaster recovery testing for the July and
       August 2011 disaster recovery tests.
       A. Determined if a debriefing was held at the end of the disaster recovery test.
       B. Obtained and reviewed IRS reports for evaluating test results and identifying
          weaknesses and corrective actions to improve IRS preparedness. These reports
          included the Test Case Daily Action Report, Detailed Daily Observation Report,
          Vulnerabilities Matrix, and Executive Overview Report.
       C. Determined if the reports appeared to cover all the key issues or problems that we
          learned of during our observations of the test, attendance at meetings, or otherwise
          reported in disaster recovery reporting documentation.
       D. Determined which IRS managers and executives these reports were presented to.
Internal controls methodology
Internal controls relate to management’s plans, methods, and procedures used to meet their
mission, goals, and objectives. Internal controls include the processes and procedures for
planning, organizing, directing, and controlling program operations. They include the systems
for measuring, reporting, and monitoring program performance. We determined the following
internal controls were relevant to our audit objective: the Cybersecurity organization’s policies,
procedures, and practices for planning, conducting, and reporting on disaster recovery tests. We
evaluated these controls by interviewing staff of the Cybersecurity organization, observing
disaster recovery tests, and reviewing plans and reports the IRS prepared on its disaster recovery
tests.

Appendix II

Major Contributors to This Report

Alan R.
Duncan, Assistant Inspector General for Audit (Security and Information Technology
Services)
Danny Verneuille, Director
Carol Taylor, Audit Manager
Myron Gulley, Acting Audit Manager
Richard Borst, Senior Auditor
Chinita Coates, Auditor
Anthony Morrison, Program Analyst

Appendix III

Report Distribution List

Commissioner C
Office of the Commissioner – Attn: Chief of Staff C
Deputy Commissioner for Operations Support OS
Deputy Chief Information Officer for Operations OS:CTO
Associate Chief Information Officer, Applications Development OS:CTO:AD
Associate Chief Information Officer, Cybersecurity OS:CTO:C
Associate Chief Information Officer, Enterprise Operations OS:CTO:EO
Associate Chief Information Officer, Modernization Program Management Office OS:CTO:MP
Director, Security Risk Management OS:CTO:C:SRM
Chief Counsel CC
National Taxpayer Advocate TA
Director, Office of Legislative Affairs CL:LA
Director, Office of Program Evaluation and Risk Analysis RAS:O
Office of Internal Control OS:CFO:CPIC:IC
Audit Liaison: Director, Risk Management Division OS:CTO:SP:RM

Appendix IV

Glossary of Terms

Term                       Definition
Batch Processing           The execution of a series of programs or jobs on a computer with
                           minimal human interaction.
Computing Center           IRS sites that support tax processing and information management
                           through a data processing and telecommunications infrastructure.
Customer Account Data      The next step in the IRS’s information technology modernization
Engine 2                   efforts. The CADE 2 will provide faster refunds for millions of
                           individual taxpayers and faster payment postings, account updates,
                           and taxpayer notices. The CADE 2 will be implemented in a
                           phased approach.
Cybersecurity              Manages the IRS’s Information Technology Security program. It
Organization               is responsible for ensuring compliance with Federal statutory,
                           legislative, and regulatory requirements governing measures to
                           assure the confidentiality, integrity, and availability of IRS
                           electronic systems, services, and data. It is within the
                           Modernization and Information Technology Services
                           organization.
Enterprise Life Cycle,     A structured business system development method that requires
Milestone 4B               the preparation of specific work products during different phases
                           of the development process. Enterprise Life Cycle Milestone 4B
                           is the completion of the System Development Phase, which is the
                           first phase after the Design Phase. After Milestone 4B, the
                           System Deployment Phase begins.
Enterprise Operations      Provides server and mainframe computing services for all IRS
Organization               business entities and taxpayers.
Filing Season              The period from January through mid-April when most individual
                           income tax returns are filed.
Fiscal Year                A 12-consecutive-month period ending on the last day of any
                           month, except December. The Federal Government’s fiscal year
                           begins on October 1 and ends on September 30.
Functional Exercises       Exercises in which recovery personnel execute their roles in a
                           simulated operational environment. Functional exercises involve
                           retrieving, loading, and validating backup tapes and files.
Individual Master File     The IRS database that maintains transactions or records of
                           individual tax accounts.
National Institute of      A part of the Department of Commerce that is responsible for
Standards and Technology   developing standards and guidelines for providing adequate
                           information security for all Federal Government agency
                           operations and assets.
Office of Management and   The office within the Executive Office of the President that helps
Budget                     executive departments and agencies implement the commitments
                           and priorities of the President.
Plan of Action and         A management process that outlines security weaknesses
Milestones                 pertaining to a specific system and the steps that need to be taken
                           to remediate them. It details resources required to accomplish the
                           milestones in meeting the task and scheduled completion dates for
                           the mitigation.
Recovery Time Objective    The maximum amount of time a system can remain unavailable
                           before there is an unacceptable impact on other systems or
                           supported business processes.
Table Top Exercises        Exercises that are discussion based and take place in a classroom
                           setting. Participants use disaster recovery plans to discuss how
                           they would respond to a disruption scenario.

Appendix V

Management’s Response to the Draft Report
'